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Abstract 

This paper addresses the problem of separating spectral sources which are linearly mixed with 
unknown proportions. The main difficulty of the problem is to ensure the full additivity (sum-to- 
one) of the mixing coefficients and non-negativity of sources and mixing coefficients. A Bayesian 
estimation approach based on Gamma priors was recently proposed to handle the non-negativity 
constraints in a linear mixture model. However, incorporating the full additivity constraint requires 
further developments. This paper studies a new hierarchical Bayesian model appropriate to the 
non-negativity and sum-to-one constraints associated to the sources and the mixing coefficients 
of linear mixtures. The estimation of the unknown parameters of this model is performed using 
samples obtained with an appropriate Gibbs algorithm. The performance of the proposed algorithm 
is evaluated through simulation results conducted on synthetic mixture data. The proposed approach 
is also applied to the processing of multicomponent chemical mixtures resulting from Raman 
spectroscopy. 

Index Terms 

Spectral source separation, non-negativity constraint, full additivity constraint, Bayesian infer- 
ence, Markov chain Monte Carlo methods. 
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I. Introduction 

Blind source separation (BSS) is a signal processing problem arising in many applications 
where one is interested by extracting signals that are observed as mixtures [1]. Pioneering 
works dealing with this problem have focused on the mutual statistical independence of 
the sources, which led to the well known independent component analysis (ICA) [2]-[5]. 
However, when the sources and the mixing coefficients have to satisfy specific constraints 
the resulting constrained source separation problem becomes more compUcated. Therefore 
appropriate separation algorithms have to be developed to handle these constraints. When 
the sources are actually independent, ICA provides estimates of the sources and mixing 
coefficients which implicitly satisfy these constraints. However, these algorithms, that try 
to maximize the independence between the estimated sources, have not been designed for 
correlated sources. 

Non-negativity is a physical constraint which has retained a growing attention during 
the last decade. For instance, Plumbley and his co-authors have addressed the case of non- 
negative independent sources and proposed the non-negative independent component analysis 
algorithm [6]. The case of both non-negative sources and non-negative mixing coefficients 
has been handled by using non-negative matrix factorization algorithms (NMF) [7] and a 
Bayesian positive source separation algorithm [8]. By adding a source sparsity constraint, a 
method ensuring the sparseness of the sources (referred to as non-negative sparse coding) has 
been presented in [9]. A Bayesian approach allowing one to perform the separation of sparse 
sources has also been proposed in [10] using a T-student distribution. Cauchy Hyperbolic 
priors have been introduced in [11] without considering the non-negativity constraint. 

This paper addresses a source separation problem in the case of linear instantaneous 
mixtures where the source signals are non-negative and the mixing coefficients satisfy non- 
negativity and full additivity constraints. These constraints have been observed in many 
applications. These applications include analytical chemistry for the analysis of kinetic reac- 
tions monitored by spectroscopy [12] or image processing for the analysis of hyperspectral 
images [13]. A Bayesian framework appropriate to constrained source separation problem 
is first proposed. Prior distributions encoding non-negativity and full additivity constraints 
are assigned to the source signals and mixing coefficients. However, the standard Bayesian 
estimators resulting from these priors have no simple closed form expression. As a con- 
sequence, Markov chain Monte Carlo (MCMC) methods are proposed to generate samples 
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according to the full posterior distribution of the unknown parameters. Estimators of the 
mixing coefficients and the source signals are then constructed from these generated samples. 
The paper is organized as follows. Section EI defines a hierarchical Bayesian model (HBM) 
for the addressed constrained source separation problem. In particular, prior distributions are 
introduced such that they are concentrated on a simplex and they satisfy the positivity and 
full additivity constraints. Section IV describes a Gibbs sampling strategy that allows one 
to overcome the computational complexity inherent to this HBM. Simulations conducted on 
synthetic mixture data are presented in Section V. As a consequence, the performance of the 
proposed Bayesian estimation algorithm can be appreciated for constrained source separation 
problems. The interest of the proposed Bayesian approach is also illustrated by the analysis 
of real experimental data reported in Section VI. Conclusions and perspectives are reported 
in Section Vn. 



The linear mixing model studied in this paper assumes that the observed signal is a 
weighted sum of M unknown sources. In the case of spectral mixture data this model can 
be expressed by: 



where y^^ is the observed spectrum at time/spatial index i (i — 1, . . . , N) m \he spectral 
band (j = 1, . . . N is the number of observed spectra, M is the number of mixture 
components and L is the number of spectral bands. The coefficient c, ^ is the contribution 
of the component in the mixture and Cj^j is an additive noise modeling measurement 
errors and model uncertainties. The linear mixing model can be represented by the following 
matrix formulation: 



where the matrices Y = [y,,,] . . e M^><^ C = [c^,m\^ e M^><*^ S = e M^><^ and 



E = e M^^^ contain respectively the observed spectra, the mixing coefficients, the 

spectral sources and the additive noise components. The noise sequences = [ej^i, . . . , ej^i]^ 
(i — 1,. . . ,N) are assumed to be independent and identically distributed (i.i.d.) according 
to zero-mean Gaussian distributions with covariance matrices oI^^l, where 1^ is the L x L 
identity matrix. Note that this last assumption implies that the noise variances are the same 
in all the spectral bands. This reasonable assumption has been considered in many recent 



II. Problem statement 




(1) 



Y = CS + E, 



(2) 
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works including [8] and [11]. It could be relaxed at the price of increasing the computational 
complexity of the proposed algorithm [14]. 

In the framework of spectral data analysis, it is obvious from physical considerations 
that both the mixing coefficients and the source signals satisfy the following non-negativity 
constraints: 

Sm,j>0 mdci^rn>0, \/{i,m,j). (3) 

Moreover, in many applications, the mixing coefficients have also to satisfy the full additivity 
constraint^ : 

M 

J]q,^ = 1 Vi (4) 

m=l 

These applications include spectroscopy for the analysis of kinetic reactions [15] and hyper- 
spectral imagery where the mixing coefficients correspond to abundance fractions [16]. 

The separation problem addressed in this paper consists of jointly estimating the abun- 
dances and the spectral sources under the non- negativity and the full additivity constraints. 
There are several methods allowing one to address the estimation problem under non-negativity 
constraint. These methods include NMF methods [17] and its variants [1]. From a Bayesian 
point of view an original model was proposed in [8] where Gamma priors are used to encode 
the positivity of both the sources and the mixing coefficients. This paper goes a step further 
by including the additivity of the mixing coefficients in the Bayesian model. Note that this 
constraint allows one to resolve the scale indeterminacy inherent to the linear mixing model 
even if non-negativity constraint is imposed. Indeed, this full additivity constraint enforces 
the £i norm of each concentration vector Cj to be equal to ||cj||^ = Sm=i I'^hml — 1- 

III. Hierarchical Bayesian Model 

The unknown parameter vector for the source separation problem described previously 
is (S,C, cTg) where S and C are the source and concentration matrices and crl = 

((Te,i, • • • ,o"e,Ar)^ contaius the noise variances. Following the Bayesian estimation theory, the 
inference of the unknown parameters from the available data Y is based on the posterior dis- 
tribution / (0|Y), which is related to the observation likelihood / (Y|0) and the parameter 
priors / (0) via the Bayes' theorem: 

/(0|Y)oc/(Y|0) /(0), 

'This condition is also referred to as sum-to-one constraint in tlie literature. 
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where oc means "proportional to". The observation likelihood and the parameters priors are 
detailed in the sequel. 

A. Observation likelihood 

The statistical assumptions on the noise vector and the linear mixing model described 

in (1) allow one to write: 

yi\S,c,,al,r^Af{S''c,,al,lL), (5) 

where = [yi^i, . . . , yi,L]^, Ci = [q^i, . . . , q^m]""" and J\f{-, •) denotes the Gaussian distribu- 
tion. By assuming the mutual independence between the vectors ei, . . . ,ejv, the likelihood 
of Y is: 2 

/ (Y|C, S, al) oc -J-^ exp ( f t^^^) , (6) 

where ||x||2 = (x^x) ^ stands for the standard £2 norm. 

B. Parameter Priors 

1) Concentrations: In order to ensure the non-negativity and additivity constraints, the 
concentrations are assigned a Dirichlet prior distribution. This distribution is frequently used 
in statistical inference for positive variables summing to one. The Dirichlet probability density 
function (pdf) is defined by: 

where 5i, . . . , 5m0 are the Dirichlet distribution parameters, F (•) is the Gamma function and 
1a( ) denotes the indicator function defined on the set A: 

1a(x) = 1, ifxe^; 

(8) 

1a (x) = 0, otherwise. 

According to this prior, the expected value of the spectral source abundance is E[cj m] = 

M 

5m/ Yl ^rn- We assumc here that the abundances are a priori equiprobable (reflecting the 

m=l 

absence of knowledge regarding these parameters) which corresponds to identical parameters 
{5m = 1, Vm — 1, ■ ■ ■ , M}. An interesting reparametrization can be introduced here to handle 
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the full additivity constraint. This reparametrization consists of splitting the concentration 
vectors into two parts^: 

Ci = [aJ,Ci,MY , (9) 

where = [c^i, . . . , Q,m-i] and q^m = 1 — J2m=i ^'i,m- It induces a new unknown parameter 
vector = (A, S, cr^} (the same notation is used for this new parameter vector to avoid 
defining new variables). The proposed prior for eij, i = 1, . . . , A?^ is a uniform distribution on 
the following simplex: 

S= |a,;a,,„>0, Vm = l,...,M-L a,,^ < l| . (10) 

t m=l J 

By assuming a priori mutual independence between the vectors aj, the prior distribution for 
the matrix A = [ai, . . . , ajv]^ reduces to: 

N 

/(A)ocnis(a.). (11) 

1=1 

2) Source signals: To take into account the non-negativity constraint, the two parameter 
Gamma distribution seems to be a good candidate thanks to its flexibility, i.e. the pdf has 

many different shapes depending on the values of its parameters (see [8] for motivations). 
This distribution encodes positivity and covers a wide range of distribution shapes-'. The 
assumption of independent source samples leads to a prior distribution for each spectral 
source expressed as: 

L L 

n te"' i-f^rnSmj) W {Smj)] ■ (12) 

Note that this distribution generalizes the exponential prior presented in [19], [20] and 
has the advantage of providing a wider variety of distributions (see also paragraph V- 
E for additional details regarding the exponential prior). Finally, by assuming the mutual 
independence between the spectral sources, we obtain the following prior distribution for S: 

M 

/(S|a,/3) = l[f{sm\am,(3m), (13) 
where a — [ai, . . . , ckm]^ and /3 = . . . , PmV source hyperparameter vectors. 

^From a practical point of view, it is interesting to note ttiat the component of a, to be discarded will be randomly chosen 
at each iteration of the Algorithm introduced in Section IV. 

more general model would consist of using a mixture of Gamma distributions as in [18]. However, the Gamma 
distribution which leads to a simple Bayesian model has been preferred here for simpUcity. 



m 



r(am) 
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3) Noise variances: Conjugate priors which are here inverse Gamma (IG) distributions 
are chosen for each noise variance (7g ^ [21, App. A]: 

<.k^e-X^(f,|), (14) 

where (a, b) denotes the IG distribution with parameters a and b. Note that choosing con- 
jugate distributions as priors makes the Bayesian analysis easier [22, Chap. 2]. By assuming 
the independence between the noise variances (j| j, i = 1, . . . , iV, the prior distribution of crl 
is: 

N 

/(o-,2|pe,V'e)=n/«k^0- (15) 

i=l 

The hyperparameter will be fixed to pe = 2 whereas ijje is an adjustable hyperparameter 
as in [23]. 

C. Hyperparameter priors 

The hyperparameter vector associated with the prior distributions previously introduced is 
# = {a,/3,V'e}- Obviously, the BSS performances depend on the values of these hyperpa- 
rameters. In this paper, we propose to estimate them within a fully Bayesian framework by 
assigning them non-informative prior distributions. This naturally introduces a second level 
of hierarchy within the B ayes' paradigm, resulting in a so-called hierarchical Bayesian model 
[24, p. 299]. 

1) Source hyperparameters: Conjugate exponential densities with parameters A^^ have 
been chosen as prior distributions for the hyperparameters [21, App. A]: 

am\Km ~ ^ i^'aj ■ (16) 

Conjugate Gamma distributions with parameters {af3^,(3p^) have been elected as prior dis- 
tributions for the hyperparameters [21, App. A]: 

Pm I (^iSm ,/3/3m'^G {ap^ ,/3/3^)- (17) 

The fixed hyperparameters {otp^, I3ji^, Aq^}^ have been chosen to obtain flat priors, i.e. with 
large variances: ajj^ = 2, = lO^^ and = 10"^ 
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2) Noise variance hyperparameters: The prior for ip^ is a non-informative Jeffreys' prior 
which reflects the lack of knowledge regarding this hyperparameter: 



f{iJe) OC ^lM+(^e)- 
Ye 



(18) 



Assuming the independence between the hyperparameters, the prior distribution of the hy- 
perparameter vector $ = {a, /3, V'e} can be written as: 



M 



exp {-XarnO^m) 1r+ (Oim)] 



m=l 
M 



(19) 



- '- -' Ye 



m=l 



D. Posterior distribution of © 

The posterior distribution of the unknown parameter vector = {A, S, cTg) can be 
computed from the following hierarchical structure: 

/(0|Y) « J /(Y|0)/(0|*)/(*)d*, (20) 

where / (Y|0) and /(^) have been defined in (6) and (19). Moreover, by assuming the 
independence between A, S and cr^, the following result can be obtained: 

/ (0|*) = / (A) / (S|o-2) / {al\p,, V'e) , (21) 

where / (A), / (SjcTg) and / (cr^|pc, V^c) have been defined previously. This hierarchical 
structure, depicted in the directed acyclic graph (DAG) of Fig. 1, allows one to integrate out 
the hyperparameters V'e and /3 from the joint distribution / (0, ^|Y), yielding: 



TV 



/(A,S,^7,^a|Y) ocj] 



i=l 



ls(a.) 



X 



X 



M 

n 

m=l 
M 

n 

m=l 



r (La^ + ap^ + 1) 



(22) 



n 



r (ar 



The posterior distribution in (22) is clearly too complex to derive the classical Bayesian 
estimators of the unknown parameters, such as the minimum mean square error (MMSE) 
estimator or the maximum a posteriori (MAP) estimator. To overcome the difficulty, it is 
quite common to make use of MCMC methods to generate samples asymptotically distributed 
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Fig. 1. DAG for the parameter priors and hyperpriors (the fixed parameters appear in dashed boxes). 



according to the exact posterior of interest [24]. The simulated samples are then used to 
approximate integrals by empirical averages for the MMSE estimator and to estimate the 
maximum of the posterior distribution for the MAP estimator. The next section proposes a 
Gibbs sampling strategy for the BSS of the spectral mixtures under the positivity and full 
additivity constraints. 

IV. GiBBS SAMPLER 

The Gibbs sampler is an iterative sampling strategy that consists of generating samples 
(denoted ^*^) distributed according to the conditional distribution of each parameter. This 
section describes a Gibbs sampUng strategy generating samples 

asymptotically distributed according to (22). The main steps of the algorithm (denoted as 
Algorithm 1) are detailed from subsection IV-A to subsection IV-C. 



Algorithm 1. Gibbs sampling algorithm for blind spectral source separation 
• Initialization: 

1) sample the hyperparameter ^e°^ from the pdf in (18), 

2) for i — 1, . . . , N, sample the noise variance {5ei}^°^ from the pdf in (14), 

3) for m = 1, . . . , M, sample the hyperparameter 5m'' from the pdf in (16), 

4) for m = 1, . . . , M, sample the hyperparameter f3m from the pdf in (17), 

September 23, 2009 DRAFT 



10 



5) for m = 1, . . . , M, sample the source spectrum from the pdf in (12). 

6) Set t ^ 1, 

• Iterations: for i = 1, 2, . . . , do 

1) for i = 1, . . . , A'^, sample the concentration vector af ^ from the pdf in (25), 

2) sample the hyperparameter ^e*^ from the pdf in (26), 

3) for i = 1, . . . , iV, sample the noise variance {5e,j}*^*^ from the pdf in (27), 

4) for m = 1, . . . , M, sample the hyperparameter am from the pdf in (28), 

5) for m = 1, . . . , M, sample the hyperparameter from the pdf in (29), 

6) for m = 1, . . . , M, sample the source spectrum s^^ from the pdf in (30). 

7) Set t^t + 1. 



A. Generation according to / (A|S, cr^, Y) 

Straightforward computations yield for each observation: 



/ (a* |S,(7g_i,yi) oc exp 



Mi 



iT(aj), 



(23) 



where: 



Mi 



A. 



(ST^f,. - Smu"^)"^ {Yi - Sm) 



(24) 



with u = [1, . . . , 1]^ e R^~^ and where S_m,. denotes the matrix S from which the M*^ 

row has been removed. As a consequence, aj|S, j, is distributed according to a truncated 
Gaussian distribution on the simplex S: 

a, |S,(7j^,yi ~ A/'s(/ii,Ai). (25) 

When the number M of spectral sources is relatively small, the generation of a^ |S,crg j,yi 
can be achieved using a standard Metropolis Hastings (MH) step. By choosing the Gaussian 
distribution j\/'(/Ltj, Aj) as proposal distribution for this MH step, the acceptance ratio of the 
MH algorithm reduces to 1 if the candidate is inside the simplex § and otherwise. For 

higher dimension problems, the acceptance ratio of the MH algorithm can be small, leading 
to poor mixing properties. In such cases, an alternative strategy based on a Gibbs sampler 
can be used (see [25] and [26]). 
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B. Generation according to f (cr^ | A, S, Y) 

To sample according to / (cTg | A, S,Y), it is very convenient to generate samples from 
/ (<Tg, V'e |A, S, Y) by using the two following steps: 

1) Generation according to f (V'e l^e) -^^ S, Y); The conditional distribution is expressed 
as the following IG distribution: 

AKp.-igf'f^-f^^]. (26) 

\ i=l e,iy 

2) Generation according to f {crl |^e, A, S, Y ); After a careful examination of / ((Tg, A, | S, Y) , 
it can be deduced that the conditional distribution of the noise variance in each observation 
spectrum is the following IG distribution: 

2 1/ G ( Pe + L iJe+\\yi-ScJ\f\ 

C. Generation according to f {S\A,al,Y) 

This generation can be achieved thanks to the three following steps, as in [8]. 
1) Generation according to f (a |/3, S, A, <Tg, Y); From the joint distribution / (A, S, crl, a, (3\Y), 
we can express the posterior distribution of (m = 1, . . . , M) as: 

0" 



f {am\Sm,Pm) OC JJ 

j = l L 



'lR+(an^). (28) 



This posterior is not easy to simulate as it does not belong to a known distribution family. 
Therefore, an MH step is required to generate samples am distributed according to (28). 
The reader is invited to consult [8] for more details regarding the choice of the instrumental 
distribution in order to obtain a high acceptance rate for the MH algorithm. 

2) Generation according to / (/3 |q;, S, A, cr^, Y).- Similarly, the posterior distribution 
of the hyperparameter vector (3 can be determined by looking at the joint distribution 
/ (A, S, CTg, a, /3| Y). In this case, the posterior distribution of the individual hyperparameter 
Pm (nT' — ■ ■ ■ 1 is the following Gamma distribution: 

Q\^ + Lam + "a^ , Sm,j + Pamj ■ (29) 
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3) Generation according to / (S |q;, /3, A, cTg, Y); Finally, the posterior distribution of 
the source observed in the spectral band is: 



(30) 



/ (Sm,i l^m, /3m, A, O"^, Y) OC s;^™ 4r+ (s^j) exp 

with 

"m — [l^i=l 0-2 . J ' 

(-m) (31) 
1 ^^7V f^i,m^i j 

where — yi,j — Ylik^m ^hk^kj- The generation of samples distributed according to (30) is 
achieved by using an MH algorithm whose proposal is a positive truncated normal distribution 
[8]. The generation according to the positive truncated Gaussian distribution can be achieved 
thanks to an accept-reject scheme with multiple proposal distributions (see [25], [27], [28] 
for details). 

V. Experimental results with synthetic data 

This section presents some experiments performed on synthetic data to illustrate the per- 
formance of the proposed Bayesian spectral unmixing algorithm. 

A. Mixture synthesis 

The spectral sources have been simulated to get signals similar to absorption spectroscopy 
data. Each spectrum is obtained as a superposition of Gaussian and Lorentzian functionals 
with randomly chosen parameters (location, ampUtude and width) [8]. Figure 2 (left) shows 
an example of M = 3 source signals of L = 1000 spectral bands. For this application, a 
"spectral" band corresponds to a given value of the wavelength A (expressed in nanometers). 
The mixing coefficients have been chosen to obtain evolution profiles similar to component 
abundance variation in a kinetic reaction, as depicted in Figure 2 (top, right). The abundance 
fraction profiles have been simulated for N — 10 observation times, which provides N — 10 
observation spectra. An i.i.d. Gaussian sequence has been added to each observation with 
appropriate standard deviation to have a signal to noise ratio (SNR) equal to 20dB. One 
typical realization of the observed spectra is shown in Figure 2 (bottom, right). 
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Fig. 2. Left: example of M = 3 simulated spectral sources where the a;-axis corresponds to the wavelength expressed in 
nm and the j/-axis corresponds to the absorbance of the spectra. Right, top: abundance evolution profiles. Right, bottom: 
one typical realization of the observed spectra. 



B. Separation with non-negativity and full additivity constraints 

Figure 3 summarizes the result of a Monte Carlo simulation with 100 runs where the 
mixing matrix has been kept unchanged, while new sources and noise sequences have been 
generated at each run. Figure 3-a shows a comparison between the true concentrations (cross) 
and their MMSE estimates (circles) obtained for a Markov chain of A^mc — 1000 iterations 
including A^b-i = 200 bum-in iterations. These estimates have been computed according to 
the MMSE principle (i = 1, . . . , M): 

^. = l^Esf""". (32) 

^ t=l 

where Nr = Nmc — ^b-i is the number of iterations used for the estimation. The estimated 
abundances are clearly in good agreement with the actual abundances and the estimates satisfy 
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Fig. 3. Top: Simulated (dotted) and estimated (continuous line) source spectta. Bottom: Simulated values (cross) and 
MMSE estimates (circles) of the abundances. Error bars indicate the estimated 95% confidence intervals from the simulated 
Markov chain. 



the positivity and full additivity constraints. By comparing figures 2 (left) and 3 (top), it can 
be observed that the source signals have also been correctly estimated. 

It is interesting to note that the proposed algorithm generates samples distributed according 
to the posterior distribution of the unknown parameters / (A, S, (Tg, a|Y). These samples 
can be used to obtain the posterior distributions of the concentrations or the source spectra. 
As an example, typical posterior distributions for two mixing coefficients are depicted in 
figure 4. These posteriors are in good agreement with the theoretical posterior distributions 
in (25), i.e. truncated Gaussian distributions. 

C. Monitoring sampler convergence 

An important issue when using MCMC methods is convergence monitoring. The Gibbs 
sampler detailed in section IV generates random samples ^A(*\ S*^*\ a*^*^^ asymptot- 
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Fig. 4. Left (resp. right): posterior distribution of tlie concentration of the 2" (resp. 3' ) spectral component in the mixture 
observed at index time i = 2 (resp. i = 9). The actual values appear as black bars. 



ically distributed according to the posterior distribution in (22). The quantities of interest, 
i.e. the concentration coefficients and the source spectra, are then approximated by empirical 
averages according to (32). However, two essential parameters have to be tuned: the length 
A^MC of the constructed Markov chain and the length Ny^_i of the bum-in period, i.e. the 
number of simulated samples to be discarded before computing the averages. This section 
reports some works conducted to ensure the convergence of the proposed algorithm and the 
accuracy of the estimation for the unknown parameters. 

First, the burn-in period N\,.i = 200 has been determined thanks to the popular potential 
scale reduction factor (PSRF). The PSRF was introduced by Gehnan and Rubin [29] and has 
been widely used in the signal processing literature (see for instance [30]-[32]). It consists 
of running several parallel Markov chains and computing the following criterion: 

1 



P 



1 ^ 1 m 



(33) 



(Nr-1)W(K)_ 

where W and B are the within and between-sequence variances of the parameter k, re- 
spectively. Different choices for k can be used for our source separation problem. Here, 
we consider the parameters • (i = 1, . . . ,N) as recommended in [33]. Table I shows the 
PSRF obtained for the = 10 observation times computed from M = 10 Markov chains. 
All these values of ^/p confirm the good convergence of the sampler since a recommendation 
for convergence assessment is y/p < 1.2 [34, p. 332]. 

The Markov chain convergence can also be monitored by a graphical supervision of the 
generated samples of the noise variances. As an illustration, the outputs of 10 Markov chains 
for one of the parameter Ug ^ are depicted in figure 5. All the generated samples converge to 
a similar value after a short burn-in period (200 iterations, in this example). 
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TABLE I 

Potential scale reduction factors of a^^i computed from M = 10 Markov chains. 



Obs. index 




Obs. index 


Vp 


1 


1.0048 


2 


1.0013 


3 


1.0027 


4 


0.9995 


5 


1.0097 


6 


1.0078 


7 


1.0001 


8 


0.9994 


9 


1.0080 


10 


1.0288 



10-^ 




fli i» i # il | ii»i> 



100 200 300 

iteration 



Fig. 5. Outputs of M = 10 Markov chains for the parameter a^^^. 



Once the number of bum-in iterations has been fixed, the number of iterations necessary 
to obtain accurate estimates of the unknown parameters via (32) has to be adjusted. This 
paper proposes to evaluate A^^ with appropriate grapliical evaluations (see [35, p. 28] for 
motivations). Figure 6 shows the reconstruction error associated to the different spectra 
defined as: 



elip) 



N 

y.-(c->.S^)) 

1=1 



(34) 



where c/^^ and S^^^ are the MMSE estimates of the abundance vector Cj and the source 
matrix S computed after A^b-i — 200 bum-in iterations and Nj. — p iterations. The number 
of iterations Nr required to compute the empirical averages following the MMSE estimator 
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(32) can be fixed to ensure the reconstruction error is below a predefined threshold. Figure 6 
shows that a number of iterations Nr = 500 is sufficient to ensure a good estimation of the 
quantities of interest C and S. 
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Fig. 6. Evolution of the reconstiuction error with respect to the iteration number (with a bum-in of iVb-i = 200 iterations). 



D. Comparison with other BSS algorithms 

The proposed Bayesian approach has been compared with other standard BSS methods. 
Synthetic mixtures have been processed by the non-negative ICA (NN-ICA) algorithm pro- 
posed by Plumbley and Oja [36], the iterative NMF method described in [7] and the Bayesian 
Positive Source Separation (BPSS) algorithm introduced in [8]. 

All these methods do not include the full additivity constraint. To evaluate the relevance 
of this additional constraint, ad hoc re-scaled versions of these methods have also been 
considered. Simulations have been conducted by applying the 4 algorithms using 100 Monte 
Carlo runs, each run being associated to a randomly generated source. Table II shows the 
normalized mean square errors (NMSEs) for the estimated sources and abundance matrices 
as defined in [37]: 

M II _ . ||2 

NMSE(S) = ^ , 

(35) 

N lip _ - ||2 



NMSE (C) = J2 



I ||2 
i=l 

In addition, the estimation performances have been compared in terms of dissimilarity. 
Denoted diss {■,■), measures how the estimated source spectrum differs from the reference 
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one [38] and is defined by: 

diss (s^, s^) = yT^^coirjs^^^Xif , (36) 

where corr(s^,s^) is the correlation coefficient between and its estimate s^. Conse- 
quently the average dissimilarity over the M sources is reported in Table II. 

TABLE II 

Estimation performance for different BSS algorithms (100 Monte Carlo runs). 

NMSE (S) NMSE (C) Av. Diss (S) Time (min) 



Proposed approach 0.0071 0.0024 13.9 % 44 

BPSS 0.0121 0.0025 13.4 % 45 

re-scaled BPSS 0.0126 0.0023 13.4 % 45 

NN-ICA 0.0613 0.0345 20.0 % 3 

re-scaled NN-ICA 0.0602 0.0384 19.4 % 3 

NMF 0.2109 1.9149 22.6 % 1 

re-scaled NMF 0.0575 0.0496 24.5 % 1 



These results demonstrate that an ad hoc re-scaling of the results obtained by NMF 
techniques is not always an efficient means to improve the estimation performance. Indeed, 
the ad hoc re-scaled version of NMF provides lower MSEs than the corresponding standard 
algorithms. On the other hand, this constraint does not significantly improve the NN-ICA or 
the BPSS algorithms. As far as the Bayesian algorithms are concerned, they clearly provide 
better estimation performance than the non-Bayesian approaches. However, the proposed fully 
constrained algorithm clearly outperforms the two BPSS algorithms, especially regarding the 
source estimation. 

The computation times required by each of these algorithms are also reported in Table n 
for a MATLAB implementation on a 2.2GHz Intel Core 2. This shows that the complexities of 
the proposed method and BPSS algorithms are quite similar and higher than the complexities 
of the NN-ICA and MNF algorithms. This seems to be the price to pay to obtain significantly 
better estimation performances. 
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E. Modified Bayesian models with other source priors 

As it has been mentioned previously, several distributions can be chosen as priors for the 
source spectra, provided these distributions have positive supports. The previous HBM studied 
in section ni-B.2 is based on Gamma distributions as source priors. However, simpler models 
can be obtained for instance by choosing exponential priors with different scale parameters 



2 



f (Sm |(Js,m) °^ n J- { - 1^ ) i^rn,j), (37) 



2 . 
s.m' 



3 

or positive truncated Gaussian distribution with different hidden variances cr, 

j—l ^ s,m \ ^ s,m / 

The resulting Bayesian algorithms are simpler since only one hyperparameter cTg^ has to be 
adjusted for each source. 

For both choices, conjugate IG distributions {^i^) chosen as prior distributions 
for the hyperparameters CTg^, m — 1, . . . , M. After integrating out the hyperparameter vector 
$ = {^jje, cTg}, the posterior distribution in (22) can be expressed as: 

M N 

/ (A, S, o-^ I Y) oc J] T (s^, p„ A) (sm) n 



m=l i=l 



1 / lly.-S'^c.lf ',^ 
exp — ^ ^ l§(ai 



(39) 



The scalar T (s^, Ps, V's) depends on the prior distribution used for the source spectra: 



L+ps 



[V's + ||Srn|li] ^ , for exponential priors, 

T{Sm,Ps:A)=S 21-^ ^^^^ 

[V^s + llsmiy ^ ) for truncated Gaussian priors. 
In the Gibbs sampling strategy presented in section IV, the generation according to / (S | A, cr 
in subsection IV-C is finally achieved using the following two steps: 
• generation according to / (fTg |S, A, <Tg, Y): 



where 6 = 1 for the exponential prior and h — 2 otherwise, 
• generation according to / (S Icr^, A, cr^, Y) 

\al A, alY - U+ (A^, 5^1^) , (42) 

where and 5^, similar to (31), are derived following the model in [8]. 
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Table III reports the NMSEs (computed from 100 Monte Carlo runs following (35)) for the 
sources and concentration matrices estimated by the different Bayesian algorithms. The results 
are significantly better when employing the Gamma distribution, which clearly indicates that 
the Gamma prior seems to be the best choice to model the distribution of the sources when 
analyzing spectroscopy data. 

TABLE III 

NMSE FOR DIFFERENT SOURCE PRIORS (100 MONTE CARLO RUNS). 





Gamma 


Truncated Gaussian 


Exponential 


NMSE (S) 


0.0071 


0.0269 


0.0110 


NMSE (C) 


0.0024 


0.0089 


0.0029 



VI. Separation of chemical mixtures monitored by Raman spectroscopy 

Calcium carbonate is a chemical material used commercially for a large variety of ap- 
plications such as filler for plastics or paper. Depending on operating conditions, calcium 
carbonate crystallizes as calcite, aragonite or vaterite. Calcite is the most thermodynamically 
stable of the three, followed by aragonite or vaterite. Globally, the formation of calcium 
carbonate by mixing two solutions containing respectively calcium and carbonate ions takes 
place in two well distinguished steps. The first step is the precipitation one. This step is 
very fast and provides a mixture of calcium carbonate polymorphs'^. The second step (a slow 
process) represents the phase transformation from the unstable polymorphs to the stable 
one (calcite). The physical properties of the crystallized product depend largely on the 
polymorphic composition, so it is necessary to quantify these polymorphs when they are 
mixed. Several techniques based on infrared spectroscopy (IR), X ray-diffraction (XRD) or 
Raman spectroscopy (RS) can be used to determine the composition of CaCOa polymorph 
mixtures. However, contrary to XRD and IR, RS is a faster method since it does not require a 
sample preparation and is a promising tool for an online polymorphic composition monitoring. 
In our case, the crystallization process of calcium carbonate is carried out in 5mol/L NaCl 

''The ability of a chemical substance to crystallize with several types of structures, depending on a physical parameter, 
such as temperature, is known as polymorphism. Each particular form is said a polymorph. 
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solutions, which correspond to a real industrial situation. Under the industrial conditions, the 
calcite is the desired product. 

The main purpose of this experiment is to show how the proposed constrained BSS method 
can be used for processing Raman spectroscopy data to study the relation between polymorphs 
and temperature and to explore favorable conditions for calcite formation in saline solutions. 

A. Mixture preparation and data acquisition 

Calcium chloride and sodium carbonate separately dissolved in sodium chloride solutions 
of the same concentration (5mol/L) were rapidly mixed to precipitate calcium carbonate. 
A lOOmL solution containing 0.625M of Na2C03 and 5M of NaCl was added to a 2.5L 
solution containing 0.025M of CaCl2 and 5M of NaCl (the precipitation is carried out under 
stoichiometric conditions). A preliminary investigation detailed in [39] suggested that the 
temperature and the aging time are the most important factors that can affect the polymorphic 
composition. Therefore the experiments were operated in a temperature range between 20C 
and 70C and retaining several aging times of the precipitated mixture. A sample was collected 
2 minutes after the beginning of the experiment to determine the polymorphic composition 
at the end of the precipitation step. Then, samples were collected at regular time intervals to 
follow the polymorph transformation. 

Raman Spectra were collected on a Jobin-Yvon T64000 spectrometer equipped with an 
optical microscope, a threefold monochromator, and a nitrogen-cooled CCD camera. The 
excitation was induced by a laser beam of argon Spectra Physic Laser Stability 2017 at a 
wavelength of 514. 5nm. The beam was focused using a long-frontal x50 objective (numerical 
aperture 0.5) on an area of about 3//m^. The laser power on the sample was approximately 
20mW and the acquisition time was 1 minute. The spectral resolution was 3cm~^, with a 
wavenumber precision better than lcm~^. The Raman spectra were collected at five points, 
which were randomly distributed throughout the mixture. The average of all spectra was con- 
sidered as the Raman spectrum of the corresponding mixture for the considered temperature 
value and aging time. Raman spectra were collected 2 minutes after the beginning of the 
experiment for various temperatures ranging between 20C and 70C in order to determine 
the influence of temperature on the polymorph precipitation. Moreover for each temperature, 
Raman spectra were collected at regular time intervals for monitoring phase transformation. 
Finally, a total of = 37 Raman spectra of L = 477 wavelengths have been obtained. 
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B. Data preprocessing 

The Raman spectra of the polymorph mixture are firstly processed using a background 
removal approach proposed in [40]. In this method, the baseline is represented by a polyno- 
mial whose parameters are estimated by minimizing a truncated quadratic cost function. This 
method requires the specification of the polynomial order and the threshold of the quadratic 
cost function truncation. This method was applied for each spectrum separately with a fifth 
order polynomial and a threshold chosen by trial and error. Figure 7 shows the Raman spectra 
at the beginning of the phase transformation step, after background removal. 

20000 
18000 
16000 
„ 14000 

■f; 12000 

I 10000 

§ 8000 
E 

S. 6000 
4000 
2000 


50 100 160 200 250 300 350 400 450 
wavenumber (cm'^) 

Fig. 7. Mixture spectra at the beginning of the phase transformation. 



C. Polymorph mixture separation under non-negativity and full additivity contraints 

The number of sources to be recovered is fixed to M = 3 according to the prior knowledge 
on the mixture composition. The iteration number is fixed to 1000 iterations where the first 
200 samples are discarded since they correspond to the bum-in period of the Gibbs sampler. 
Figure 8 illustrates the estimated spectra using the proposed approach incorporating the non- 
negativity and the full additivitty constraints. 

From a spectroscopic point of view and according to the positions of the vibrational peaks, 
the identification of the three components is very easy: the first source corresponds to Calcite, 
the second spectrum to Aragonite and the third one to Vaterite. A measure of the dissimilarity 
between the estimated spectra and the measured pure spectra of the three components gives 
4.56% for Calcite, 0.65% for Aragonite and 4.76% for Vaterite. These results show that the 
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proposed method can be applied successfully without imposing any prior information on the 
shape of the pure spectra. 
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Fig. 8. Estimated sources. 

The evolution of the polymorph proportions versus temperature is shown in Fig. 9. Pure 
Vaterite is observed at 20C and a quite pure Aragonite is obtained at 60C. However, between 
20C and 60C ternary mixtures are observed. The content of Calcite is maximal at 40C. Let us 
now consider the phase transformation evolution at this temperature value. The concentration 
profile versus precipitation time at 40C is reported in figure 10. At the beginning of the phase 
transformation (2 minutes), the ternary mixture is composed of 50% Vaterite, 35% Aragonite 
and 15% Calcite. After 2 hours, the Vaterite is transformed to Aragonite and Calcite. After 
7 hours, Vaterite and Aragonite are almost totally transformed to calcite. So, aging time 
promotes the formation of Calcite which is in agreement with some results reported in the 
Uterature [41], [42]. 

D. Polymorph Mixture analysis using other BSS algorithms 

The dataset resulting from this experiment is also used to compare the performances of 
standard BSS methods taking into account the non-negativity constraint and their re-scaled 
versions ensuring the full additivity constraint. Table IV summarizes the performances of the 
considered separation algorithms in terms of normalized mean square errors, dissimilarity 
measures and computation times. It can be noticed that the proposed approach provides source 
estimates with better accuracy than the other methods. In addition to the good estimation 
quality, the second advantage of the proposed method is its ability to scale the sources during 
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Fig. 9. Three component abundances at the beginning of the phase transformation for different temperature values. 
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Fig. 10. Evolution of the three component abundances for T = 40C. 



the estimation algorithm. Thus it does not require any post-processing of the estimation 
results. However, as previously highlighted, the price to pay for having such results is the 
computational times required by the proposed MCMC-based estimation method. 

VII. Conclusion 

This paper studied Bayesian algorithms for separating linear mixtures of spectral sources 
under non-negativity and full additivity constraints. These two constraints are required in 
some applications such as hyperspectral imaging and spectroscopy to get meaningful solu- 
tions. A hierarchical Bayesian model was defined based on priors ensuring the fulfillment 
of the constraints. Estimation of the sources as well as the mixing coefficients was then 
performed by using samples distributed according to the joint posterior distribution of the 
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TABLE IV 

Estimation performance for different BSS algorithms on real spectroscopic data. 





NMSE(S) 


Diss(S) 


Time (sec) 


Proposed approach 


0.0072 


3.34 


146 


BPSS 


0.0118 


4.87 


205 


re-scaled-BPSS 


0.0124 


4.87 


205 


NNICA 


0.1007 


11.82 


29 


re-scaled NNICA 


0.3996 


11.82 


29 


NMF 


0.0093 


4.25 


26 


re-scaled NMF 


0.0109 


4.25 


26 



unknown model parameters. A Gibbs sampler strategy was proposed to generate samples 
distributed according to the posterior of interest. The generated samples were then used to 
estimate the unknown parameters. The performance of the algorithm was first illustrated by 
means of simulations conducted on synthetic signals. The application to the separation of 
chemical mixtures resulting from Raman spectroscopy was finally investigated. The proposed 
Bayesian algorithm provided very promising results for this application. Particularly, when the 
computational times is not a study constraint, the proposed method clearly outperforms other 
standard NMF techniques, which can give approximative solutions faster. Perspectives include 
the development of a similar methodology for unmixing hyperspectral images. Some results 
were already obtained for the unmixing of known sources. However, the joint estimation of 
the mixing coefficients (abundances) and the sources (endmembers) is a still an open and 
challenging problem. 
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